ABSTRACT


Existing literature suggests that the hearing mechanism deals with
incoming speech material by filtering the signals into a series of
frequency bands. The width of these bands has been referred to as the
critical band, that is, the perceptual frequency bandwidth observed in
a variety of psychoacoustic contexts. Digital processing techniques
have been developed for altering available recorded speech materials
so that the frequency resolution available in the resultant stimuli
may be controlled. Tapes have been produced in which the frequency
bandwidth resolution is limited to no better than one critical band,
and these tapes have been used in intelligibility testing. Some
existing research indicates that the critical band is significantly
widened in many individuals with sensorineural hearing loss of cochlear
etiology. The digital processing routines described above were also
used to develop tape-recorded materials with bandwidth resolution
limits considerably wider than the normal critical band. The
bandwidths chosen for this stage of the digital processing were based
on empirical observations of the critical band in patients with
sensorineural hearing impairment. These recordings were also used in
intelligibility testing with normal listeners. Implications of these
studies for the clinical measurement of speech intelligibility will be
discussed.
Chapter I


BACKGROUND 


Brief  Physiology  Overview 

Detailed descriptions of auditory anatomy may be found in such
sources as Fex (1962), Goldstein (1968), Milner (1970), Dallos (1973),
and Spoendlin (1973). The portion of the inner ear primarily
responsible for the hearing process is called the cochlea. This coiled,
membranous structure contains three compartments, or scalae, separated
lengthwise along the two and three-quarter turns of the human cochlea.
The scala media houses the Organ of Corti, whose complex set of
receptor cells serves as the site where continuous acoustic waves are
converted into neural impulses. An acoustic input arriving at the oval
window via the ossicles gives rise to transverse and longitudinal waves
in the cochlea's endo- and perilymphatic fluids, respectively. Wave
propagation in the cochlea is essentially a case of wave propagation in
a shallow fluid of nonuniform depth (i.e., similar to ocean waves
approaching a beach). The scala tympani and the scala vestibuli become
gradually "shallower" towards the apical end. The point at which the
transverse wave at the fluid interface of these two scalae (i.e., the
scala media duct) crests depends on its frequency (Dallos, 1973). High
frequency waves crest near the basal end (far from "shore"), while low
frequency waves crest close to the apical end. Maximal dissipation of
the waves' energy occurs at this location. The inner and outer hair
cells of the Organ of Corti are embedded in the basilar membrane.
These hair cells have cilia (hairlike projections) which are rooted in
a surface plate; the ends of these cilia either float freely in the
endolymph or course into the opposite tectorial membrane.

Wave motion causes the cilia to undergo shear, inducing the
corresponding hair cell to trigger the neuronal fibers synapsed at its
base. The hair cells located at the wave's crest point undergo maximal
shear and thus relay the most neural information. The hydrodynamic
construction of the cochlea, combined with the morphology of the Organ
of Corti, therefore yields a preliminary place-specific frequency
analysis of the acoustic signal.

The dendrites of bipolar neurons synapse at the base of the hair cells;
these neurons carry frequency, phase, and amplitude information towards
the brain stem. Their cell bodies collectively form the spiral
ganglion, located in the modiolus, and their axons, proceeding to the
brainstem, make up the bulk of the auditory nerve (VIII). As the
information travels to the cortex, it is relayed to the superior olives
by the dorsal and ventral cochlear nuclei. From the superior olives,
information is fed to the inferior colliculi via the lemniscal
pathways. It is then transmitted via the brachium of the inferior
colliculus to the medial geniculate bodies of the thalamus, and from
there via the auditory radiations to the temporal cortex. Decussations
take place at the level of the dorsal cochlear nucleus, the superior
olives, and the inferior colliculus. The first stop for an afferent
message arriving at the cortex is Brodmann areas 41 and 42 (auditory
area A-I of Woolsey and Walzl, 1942) of the temporal lobe, located on
the superior temporal gyrus within the lateral (Sylvian) fissure.

The existence of an efferent auditory pathway was discovered near the
turn of the century by Ramón y Cajal (1896) and elucidated by Lorente
de Nó (1937). Originating in the posterior cephalad region of the
diencephalon, the downward coursing fibers pass through much the same
relays as the afferent pathway, in the opposite order (Fex, 1962).
Rasmussen (1946) gave a highly detailed account of the portion of the
efferent pathway running from the superior olives to the cochlea,
naming it the olivo-cochlear bundle. The most peripheral part of this
tract branches and spirals to distribute itself to all turns of the
Organ of Corti. Subsequent research by Fernandez (1951) noted that the
branched fibers innervate a diffuse group of inner and outer hair
cells.

The  Critical  Band  Phenomenon 

The literature on auditory theory relies heavily on the notion of
critical bands (e.g., Scharf, 1970). Most simply, a critical band may
be conceived of as an internal bandpass filter. In the initial
conception of critical bands (Fletcher, 1940), the auditory system was
theorized to consist of a fixed bank of about twenty-four critical
bands laid end to end, covering the audible frequency range. By
contrast, current notions view critical bands as variable filter
elements centered upon a particular signal frequency (cf. Scharf,
1970). The critical band has been defined empirically as "that
bandwidth at which subjective responses rather abruptly change" (cf.
Scharf, 1970, p. 159). In general, two stimuli separated in frequency
by less than a critical bandwidth will interact in one of a number of
ways, while two stimuli separated by more than a critical bandwidth
will not. Changes in listener response due to the critical band
phenomenon have been observed in such perceptual phenomena as masking,
loudness, and musical consonance.

Masking and the Critical Band. A condition of masking is said to be in
effect when a temporary loss of sensitivity to a stimulus occurs,
caused by a simultaneous, ipsilateral presentation of another stimulus
(Moore, 1977). By and large, a masking stimulus is most effective in
hiding a given signal whenever the frequency content of the two signals
is similar (Scharf, 1970). In this context, then, a critical band may
be defined as that masker frequency region wherein the masking of a
given stimulus is most effective (Scharf, 1970). Maskers whose energy
is concentrated in regions wider than the critical band of the test
signal mask less efficiently. A masker's efficiency may be defined by
the amount of masking energy needed to produce a given threshold of
masking; the less energy required, the more efficient the masker. In a
psychoacoustic context, a masking source is most efficient when its
bandwidth lies within the critical band. Thus, the critical bandwidth
mechanism seems to provide the auditory system with a resolution
capability whose limit is reflected in its ability to mask signals.
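The efficiency argument can be made concrete with a deliberately simplified power-spectrum model, offered only as an illustration and not as the procedure of the studies cited. It assumes that a tone is masked only by the masker power falling inside the critical band centered on the tone, and that the masker's power is spread evenly over its bandwidth. Under those assumptions, a masker of fixed total power loses effectiveness once its bandwidth exceeds the critical band. The 160 Hz bandwidth at 1,000 Hz is taken from Table 1; the function name is hypothetical.

```python
def effective_masker_power(total_power, masker_bw_hz, critical_bw_hz):
    """Power of a flat-spectrum masker that lands inside one critical band.

    Simplified power-spectrum model (an assumption for illustration): the
    masker's power is spread evenly over masker_bw_hz and centered on the
    test tone, and only the portion inside the critical band contributes.
    """
    if masker_bw_hz <= 0:
        raise ValueError("masker bandwidth must be positive")
    fraction_inside = min(masker_bw_hz, critical_bw_hz) / masker_bw_hz
    return total_power * fraction_inside


CB_1000_HZ = 160.0  # critical bandwidth at 1,000 Hz (Table 1)

for bw in (40, 80, 160, 320, 640):
    p = effective_masker_power(1.0, bw, CB_1000_HZ)
    print(f"masker bandwidth {bw:4d} Hz -> in-band power {p:.2f}")
```

With total power held constant, the in-band power stays at its full value up to 160 Hz and halves with each doubling of bandwidth beyond it, which is the sense in which wider maskers are less efficient.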


Loudness and the Critical Band. Concomitant to the limits of masking
bandwidth is the way a listener perceives the intensity of noise bands
as a function of frequency bandwidth (Δf). Studies by Zwicker and
Feldtkeller (1955) and Zwicker (1958) showed a significant relationship
between the bandwidth of the stimulus and the point at which a listener
hears a change in loudness:

. . . the loudness of a subcritical complex sound of
invariant intensity is largely independent of Δf;
it is about as loud as an equally intense pure tone
lying at the band's center frequency. Only when Δf
exceeds the critical band does the loudness of the
complex begin to increase (Scharf, 1970).

The process of integrating incoming sound stimuli for loudness
perception, then, appears to operate through the filtering effect of
the critical bandwidth mechanism.
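A crude numerical sketch of this integration follows. It assumes, hypothetically, that a noise band of fixed intensity is divided evenly among the critical bands it spans and that loudness within each band grows as the 0.3 power of that band's intensity before the bands are summed. The exponent and the even division are illustrative assumptions, not values taken from the studies cited. Even so, the model reproduces the qualitative result quoted above: loudness is flat while Δf stays within one critical band and rises once Δf exceeds it.

```python
def band_loudness(intensity, bandwidth_hz, critical_bw_hz, exponent=0.3):
    """Relative loudness of a noise band of fixed total intensity.

    Crude summation sketch (assumption): the band is split into
    n = max(1, bandwidth / critical bandwidth) equal critical-band
    channels, each channel's loudness follows a power law in its share
    of the intensity, and the channel loudnesses add.
    """
    n_bands = max(1.0, bandwidth_hz / critical_bw_hz)
    per_band_intensity = intensity / n_bands
    return n_bands * per_band_intensity ** exponent


CB_HZ = 160.0  # critical bandwidth near 1,000 Hz (Table 1)
for bw in (50, 100, 160, 320, 640, 1280):
    print(f"{bw:5d} Hz: relative loudness {band_loudness(1.0, bw, CB_HZ):.2f}")
```

Because the exponent is below 1, spreading the same intensity over more critical bands increases the summed loudness, which is why the curve turns upward only beyond one critical bandwidth.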

Musical Consonance. Esthetic tests which rate the pleasantness of
two-tone complexes have provided another setting in which the critical
band phenomenon may be observed. Listeners were asked to rate the
consonance of a pair of tones on a seven-point "pleasantness" scale.
Tone pairs separated by more than a critical band were judged
predominantly consonant, and the ratings decreased rapidly as the
separation narrowed to less than a critical band (Plomp, 1964;
Greenwood, 1961).

These findings demonstrate psychoacoustically a phenomenon musicians
have understood for centuries: consonance peaks occur at the common
harmonic ratios, namely the fourth, fifth, and octave (Plomp and
Levelt, 1965). The relationship of these ratios to the critical band
phenomenon further demonstrates the universality of this effect in
hearing.

Critical Bands and Speech. Speech is the most pervasive and important
acoustic stimulus for the human listener. Evidence suggests that the
critical band may serve in the analysis of speech (cf. Scharf, 1970).
The specific task of measuring the critical bandwidth that the auditory
system uses in analyzing speech sounds has been approached indirectly
in the work of French and Steinberg (1947): the twenty-four critical
bands found in pure tone psychoacoustic studies matched very closely
the twenty-four bands which contributed equally to speech
intelligibility. A survey of other research on critical bands reveals
two functional aspects of critical bands which appear to play an
important role in the perception of speech and hence represent
fundamental aspects of a listener's performance. First, as implied in
the works of Fletcher (1940), Zwicker (1958), and Greenwood (1961), the
critical band serves to band-limit background noise. The narrower the
pass-band of the ear as a filter, the more noise the ear can reject,
making it more tolerant of lower signal-to-noise ratios. Thus, a
listener may be able to correctly perceive a spoken communication
despite background noise simply because much of the energy associated
with the noise lies outside the critical bands surrounding the formant
frequencies of the speech.

Secondly, our ability to discriminate the harmonic content of complex
signals (one of the many cues used, for instance, in speaker
identification) is likewise directly related to the critical band
phenomenon. Plomp (1964) has demonstrated that listeners are able to
discriminate those partials of a complex tone which lie more than a
critical bandwidth apart. Morton and Carpenter (1963) found that
listeners can identify formants even when no prominent energy peak is
present, as long as the most intense harmonics associated with each
formant are separated by at least a critical bandwidth. Synthetic
vowels presented to listeners by Remez (1977) showed an abrupt
changeover from speech-like to non-speech-like sounds as the formant
bandwidth increased beyond a critical bandwidth. Taken together, these
findings suggest that the critical band mechanism operates on a
peripheral basis in the analysis of speech.
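Plomp's criterion can be illustrated by counting how many harmonics of a voice-like complex lie more than a critical bandwidth apart. The sketch substitutes the Zwicker and Terhardt (1980) analytic approximation for the critical bandwidth in place of Table 1; the 200 Hz fundamental, the 5 kHz upper limit, and the `widening` parameter (a multiplier applied to the critical bandwidth) are illustrative assumptions. Because harmonics are spaced by the fundamental while the critical bandwidth grows with frequency, only the lower harmonics satisfy the criterion.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker & Terhardt (1980) analytic approximation to the critical
    bandwidth, used here in place of Table 1 for convenience."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def resolvable_harmonics(f0_hz, max_freq_hz, widening=1.0):
    """Count harmonics of f0 whose spacing (f0) exceeds the local critical
    bandwidth times a widening factor (Plomp's criterion, simplified)."""
    count = 0
    h = 1
    while h * f0_hz <= max_freq_hz:
        if f0_hz > widening * critical_bandwidth_hz(h * f0_hz):
            count += 1
        h += 1
    return count


for n in (1.0, 1.5, 2.0):
    print(f"widening x{n}: {resolvable_harmonics(200.0, 5000.0, n)} resolvable harmonics")
```

Under these assumptions the first six harmonics (up to 1,200 Hz) count as resolvable at normal bandwidth, and the count shrinks as `widening` grows.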

Two functional characteristics of critical bands have been briefly
reviewed (noise band limiting and harmonic discrimination). It is
argued from these effects that critical bands play an important role in
the correct perception of complex acoustic stimuli, such as speech.

A Mechanism for Critical Bands. A selective inhibition of
frequency-specific afferent auditory messages by the efferent
olivo-cochlear bundle may serve as the neuronal basis for the critical
band phenomenon. This gating effect by neuronal inhibition has been
clearly demonstrated to exist (cf. Rasmussen, 1946; Desmedt and Monaco,
1961; Fex, 1962). This suppression of neural response may occur at:

1.  individual  afferent  fibers  of  inner  hair  cells 
(post-synaptic),  and 


2.  the  entire  output  of  individual  outer  hair  cells 
(pre-synaptic)  (Spoendlin,  1977). 

Fex (1967) proposed a network describing the afferent and efferent
pathways involved with each cochlea. From this morphology, a
hypothetical feedback mechanism may be proposed (see Figure 1). The
loop runs from the cochlea's hair cells through the VIII nerve to the
cochlear nucleus and on to the superior olives, and returns to the
cochlea via the olivo-cochlear bundle. Since afferent fibers outnumber
efferent fibers by a ratio of 33:1, there is a limit to the frequency
resolution with which information may be gated. Scharf (1970) reported
twenty-four critical bands (narrow frequency regions of gated
information) whose widths equal approximately 16 percent of the center
frequency for bands above 500 Hz (see Table 1).
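The 16 percent figure can be checked directly against Table 1. The sketch stores the table's center and upper cutoff frequencies, looks up the band containing a given frequency, and prints selected bands' widths as a percentage of their centers. Treating band 1 as beginning at 0 Hz is an assumption, since the table leaves its lower cutoff blank; the function names are hypothetical.

```python
import bisect

# Upper cutoff and center frequencies (Hz) of the 24 bands in Table 1.
UPPER_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                  1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                  9500, 12000, 15500]
CENTERS_HZ = [50, 150, 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600,
              1850, 2150, 2500, 2900, 3400, 4000, 4800, 5800, 7000, 8500,
              10500, 13500]


def band_number(f_hz):
    """Return the 1-based critical band of Table 1 that contains f_hz."""
    if not 0 <= f_hz < UPPER_EDGES_HZ[-1]:
        raise ValueError("frequency outside the range covered by Table 1")
    return bisect.bisect_right(UPPER_EDGES_HZ, f_hz) + 1


def bandwidth_hz(band):
    """Band width from its cutoffs; band 1 is assumed to start at 0 Hz."""
    lower = 0 if band == 1 else UPPER_EDGES_HZ[band - 2]
    return UPPER_EDGES_HZ[band - 1] - lower


for band in (9, 14, 18, 22):
    ratio = bandwidth_hz(band) / CENTERS_HZ[band - 1]
    print(f"band {band}: {bandwidth_hz(band)} Hz wide, {100 * ratio:.0f}% of center")
```

For band 9 (centered at 1,000 Hz) the ratio is exactly 16 percent; toward the top of the range it rises above 20 percent, so the figure is best read as an approximation for the mid-frequency bands.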

It would follow that any decrement in the efficiency of this feedback
mechanism would produce deficits in the functional characteristics of
the critical band mechanism described above. Several studies (Fex,
1967; Dewson, 1968; Capps and Ades, 1968; Trahiotis and Elliott, 1970;
Pickles, 1973) have examined the behavioral performance of animals in
which the action of the olivo-cochlear bundle was blocked. Two
overriding observations were made:

1.  reduction  in  frequency  discrimination  ability,  and 

2.  reduction  in  the  ability  to  recognize  signals  in  a 
background  noise. 

It is further reasonable to assume that units of the neural inhibitory
system of the cochlea would be damaged whenever significant cochlear
pathology is present. In such a case, one would expect to find
wider-than-normal critical bands due to the loss of inhibitory units.

Figure 1. Hypothetical Feedback Mechanism.

Table 1

Examples of Critical Bandwidth

         Center      Critical     Lower Cutoff   Upper Cutoff
Number   Frequency   Band         Frequency      Frequency
         (Hz)        (Hz)         (Hz)           (Hz)

1        50          -            -              100
2        150         100          100            200
3        250         100          200            300
4        350         100          300            400
5        450         110          400            510
6        570         120          510            630
7        700         140          630            770
8        840         150          770            920
9        1,000       160          920            1,080
10       1,170       190          1,080          1,270
11       1,370       210          1,270          1,480
12       1,600       240          1,480          1,720
13       1,850       280          1,720          2,000
14       2,150       320          2,000          2,320
15       2,500       380          2,320          2,700
16       2,900       450          2,700          3,150
17       3,400       550          3,150          3,700
18       4,000       700          3,700          4,400
19       4,800       900          4,400          5,300
20       5,800       1,100        5,300          6,400
21       7,000       1,300        6,400          7,700
22       8,500       1,800        7,700          9,500
23       10,500      2,500        9,500          12,000
24       13,500      3,500        12,000         15,500

There  is  no  reason  to  assume  that  the  central  nervous 
system  is  aware  of  the  particular  details  of  a  peripheral  pathology. 
Thus,  the  central  nervous  system  expects  to  receive  information 
from  a  full  complement  of  critical  bands.  Hence,  the  widening  of 
individual  critical  bands  involves  two  factors: 

1.  a  broadening  of  the  integration  region  which 
results  in  each  critical  band  integrating  a 
larger  area  for  a  given  signal; 

2.  retention  of  the  complete  number  of  critical 
bands  such  that  the  critical  bands  may  be 
expected  to  overlap  one  another  in  the  widened  case. 

Therefore, the energy content of frequency regions common to more than
one band will be integrated more than once. This phenomenon may be
expected to lead to an abnormally high perception of loudness for a
given magnitude of input acoustic energy. This phenomenon is observed
psychoacoustically among sensorineural hearing impaired individuals,
and is referred to as recruitment (cf. Fowler, 1928; Michael and
Bienvenue, 1976).
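The double integration can be illustrated by widening every band in Table 1 by a common factor while keeping all twenty-four centers fixed, then counting how many widened bands contain a given frequency. Symmetric widening about each center, a single factor for all bands, and a 0 to 100 Hz extent for band 1 are simplifying assumptions made only for this sketch.

```python
# Center frequencies and widths (Hz) of the 24 critical bands in Table 1.
# Band 1 is given a 100 Hz width (0-100 Hz); this is an assumption because
# the table leaves its lower cutoff blank.
CENTERS_HZ = [50, 150, 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600,
              1850, 2150, 2500, 2900, 3400, 4000, 4800, 5800, 7000, 8500,
              10500, 13500]
WIDTHS_HZ = [100, 100, 100, 100, 110, 120, 140, 150, 160, 190, 210, 240,
             280, 320, 380, 450, 550, 700, 900, 1100, 1300, 1800, 2500, 3500]


def bands_covering(f_hz, widening):
    """Count the widened bands whose passband contains f_hz.

    Each band keeps its Table 1 center and is widened symmetrically by the
    same factor (a simplifying assumption for this sketch).
    """
    return sum(
        1
        for center, width in zip(CENTERS_HZ, WIDTHS_HZ)
        if abs(f_hz - center) <= widening * width / 2.0
    )


for factor in (1, 2, 3, 4):
    print(f"widening x{factor}: 1,000 Hz falls in {bands_covering(1000.0, factor)} band(s)")
```

At 1,000 Hz the count is 1, 2, 3, and 4 for widening factors of 1 through 4, so under this model energy at that frequency would be integrated that many times.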

Bonding (1979) observed indications of widened critical bands in some
50-67 percent of the sensorineural hearing impaired listeners he
examined. In addition, Bonding's data demonstrate that, among those
sensorineural listeners with critical bandwidth distortion, the width
of the widened critical band is independent of the magnitude of
threshold hearing loss. This finding has been supported in tests by
Michael and Bienvenue (1976), Bienvenue and Michael (1977), and Bennett
et al. (1978), who found evidence of widened critical bands in
noise-exposed patients that was not correlated with the magnitude of
threshold shift. In fact, some critical bandwidth distortion occurs in
the absence of threshold shift (Michael and Bienvenue, 1976).

The two functional characteristics of this mechanism suggest the
symptoms that may appear in a cochlear pathology. A common finding
among individuals with cochlear hearing loss is that background noise
of relatively small amplitude, even at remote frequencies, is
detrimental to speech perception. This phenomenon may very well follow
directly from the reduced band-limiting capability of the widened
critical bands (Michael and Bienvenue, 1976). In addition, a pathology
that widens the critical bands will tend to reduce the number of
discriminable harmonics of a signal such as speech. Listeners with
this problem will be less able to discriminate speech on the basis of
its harmonic content. Subjects with cochlear pathologies report that
speech sounds "foggy" or "blurred," with some insisting that everyone
mumbles when they speak (cf. Fowler, 1928; 1937). The integrity of the
listener's critical bands, therefore, appears to be a limiting factor
in the ability to perform these everyday tasks.
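The loss of band-limiting capability can be expressed as a signal-to-noise ratio within the auditory filter. Assuming white background noise (equal power per hertz) and a tone lying wholly inside the filter, noise power grows in proportion to bandwidth, so widening a critical band by a factor N lowers the in-band SNR by 10 log10 N dB. The power and density values below are arbitrary illustrative numbers, and the function name is hypothetical.

```python
import math


def in_band_snr_db(tone_power, noise_density, bandwidth_hz):
    """SNR (dB) of a tone against white noise inside one auditory filter.

    Assumes flat (white) noise with power noise_density per Hz and that
    all of the tone's power falls inside the filter.
    """
    noise_power = noise_density * bandwidth_hz
    return 10.0 * math.log10(tone_power / noise_power)


NORMAL_CB_HZ = 160.0  # critical bandwidth near 1,000 Hz (Table 1)
for n in (1, 2, 4):
    snr = in_band_snr_db(1.0, 1e-3, n * NORMAL_CB_HZ)
    print(f"critical band x{n}: in-band SNR {snr:.1f} dB")
```

Each doubling of the critical bandwidth costs about 3 dB of in-band SNR under these assumptions, independent of the absolute levels chosen.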

The difficulty with this pathology lies in its lack of remedial
measures. Although this condition has existed in the general populace
for as long as the classic case of threshold loss, no present-day
audiometric aids are available that can alleviate these symptoms.
Simply amplifying the signal to compensate for such deficiencies only
worsens the effect, because the unwanted background noise is made
equally loud along with the desired speech. Rather, a means of testing
for early signs of bandwidth widening might prevent extensive damage
and provide a pool of knowledge on which to base remedial measures.
Standard laboratory procedures for measuring critical bandwidth
typically involve lengthy psychophysical techniques performed on
trained subjects (e.g., Fletcher, 1940; Zwicker, 1954; Greenwood, 1961;
Haggard, 1974). These procedures, while extremely powerful and
accurate, are not easily applied in the clinical setting, where
listeners are untrained and unwilling to spend the requisite time
listening to sophisticated signal complexes. A clinical testing
procedure which determines critical bandwidth directly and rapidly
should therefore prove more desirable than the standard tests.


Chapter  II 


STATEMENT  OF  THE  PROBLEM 

The condition of widened critical bands should disrupt the
source-path-receiver communication chain. Given a clearly articulated
speech signal propagating through a conducive acoustic medium, one
finds a receiver whose frequency "resolving power" is insufficient to
separate that speech from the rest of the acoustic stimulus. The degree
to which the listener has lost his/her resolving power is a variable
(N = the factor by which the critical band is widened; see Figure 2a).
The approach chosen to determine N involves adding to the communication
chain an interruptive processor whose resolving characteristics are
known and can be varied at will (see Figure 2b). By observing the
response of the unknown receiver while varying the known disruption of
the presentation, N may be determined. If the frequency resolution of
the interruptive processor is finer than the resolving power of the
listener, then perception is limited simply by the listener's critical
bandwidth. Widening the bandwidth of frequency resolution for the
processor should have no effect upon the listener's acoustic analysis
of the signal (compared to the unprocessed case) until the resolution
becomes coarser than the listener's critical bandwidth (i.e., his/her
frequency resolution capacity), at which point a decrement in
performance should be observed. Thus, in this procedure the critical
bandwidth of a listener equals N times the normal bandwidth, where N is
indicated by the resolution at which he/she first exhibits a decrement
in performance.
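The decision rule just described can be sketched as a search for the first processing resolution at which performance drops. The scores, the unprocessed baseline, the fixed tolerance, and the function name are all hypothetical; a working procedure would need a criterion grounded in the test-retest variability of the word lists rather than an arbitrary margin.

```python
def estimate_widening_factor(scores_by_factor, baseline_score, tolerance):
    """Return the first processing bandwidth (as a multiple of the normal
    critical band) whose score falls more than `tolerance` below the
    unprocessed baseline, or None if no decrement is observed."""
    for factor in sorted(scores_by_factor):
        if baseline_score - scores_by_factor[factor] > tolerance:
            return factor
    return None


# Hypothetical percent-correct scores for one listener, keyed by the
# processor's resolution in multiples of the normal critical band.
scores = {1: 96.0, 2: 95.0, 3: 94.0, 4: 80.0, 5: 70.0}
print(estimate_widening_factor(scores, baseline_score=96.0, tolerance=6.0))
```

With these made-up scores, the first decrement larger than 6 percentage points appears at four times the normal critical bandwidth, so the sketch returns 4.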

The  purpose  of  this  study,  then,  is  to: 

1. Generate acoustic stimuli of varying resolutions
equal to or greater than the normal critical bandwidth
using digital signal processing; specifically,

a.  Is  it  feasible  to  produce  bandwidth  limited 
signals  such  as  those  described  in  Chapter  III? 

b.  Is  it  feasible  to  produce  a  processing  scheme 
which  allows  for  variation  of  bandwidths  from 
the  normal  critical  band  to  integer  multiples 
of  the  critical  band? 

c.  Do  these  varying  bandwidth  limited  speech 
signals  demonstrate  characteristics  observed 
in  speech  processing  by  humans  under 
pathologic  conditions  such  as  the  phenomenon 
of  recruitment? 

2. Use these signals as stimuli for speech discrimination
testing; specifically,

a.  Do  normal  listeners,  presented  with  wider  than 
normal  bandwidth  resolution  limited  signals, 
demonstrate  performance  decrements  comparable 
to  those  seen  in  sensorineural  hearing  impaired 
listeners?  That  is,  does  a  widened  bandwidth 
condition  effectively  model  sensorineural 
hearing  impaired  speech  listening? 

b.  Do  normal  hearing  listeners  demonstrate  a 
monotonic  trend  of  decreasing  performance 
in  speech  intelligibility  as  their  allowed 
bandwidth  resolution  is  systematically 
widened? That is, does the magnitude
of bandwidth widening effectively model
the magnitude of impairment in speech
discrimination?




Chapter  III 


DIGITAL  SIGNAL  PROCESSING  OF  SPEECH  MATERIALS 

General  Overview 

The  generation  of  bandwidth  resolution  limited  speech 
involves  the  digital  signal  processing  algorithm  shown  in  Figure  3. 
Prerecorded  stimuli  are  input  to  a  computer,  processed,  and  then 
rerecorded  onto  audio  tape  in  the  processed  form.  The  first  step 
of  this  procedure  requires  the  transformation  of  a  continuous  input 
into  a  series  of  discrete  elements.  The  input  has  a  continuously 
varying  amplitude  and  is  called  an  analog  signal.  The  digital 
signal  contains  an  array  of  discrete  values  corresponding  to  the 
input  amplitude.  The  device  used  to  transform  the  signal  from  a 
continuous  to  a  digital  mode  is  called  an  analog  to  digital  (or 
A/D)  converter.  Within  this  converter  is  a  timing  pulse  which 
beats  at  a  fixed  rate  known  as  the  sampling  rate.  As  the  analog 
signal  is  fed  into  the  A/D  converter  via  a  conventional  playback 
machine,  the  amplitude  of  the  signal  is  detected  at  every  occurrence 
of  the  timing  pulse.  The  digital  portion  of  a  computer  may  receive 
and  store  these  discrete  amplitudes  of  the  wave  in  the  form  of  a 
one-dimensional array of voltages. For example, a digitized sine
wave might have an array "A" equal to (0, 2, 4, 6, 4, 2, 0, -2, -4,
-6, -4, ...). The sampling rate commonly used for audio signals is in
excess of 20,000 samples per second. Because one cycle of the example
array spans twelve samples, it would represent a pure tone at the
sampling rate divided by twelve, that is, at roughly 1,700 Hz or above
(depending upon the precise sampling rate).

Figure 3. Processing Algorithm.
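A minimal sketch of the sampling step, assuming an ideal converter with no quantization: a sine of frequency f is evaluated at the instants n/fs. The 24,000 samples-per-second rate is an assumed value chosen so that a 2,000 Hz tone spans exactly twelve samples per cycle, like the example array above; the function name is hypothetical.

```python
import math


def sample_tone(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Sample a sine wave at t = n / sample_rate_hz, as an ideal A/D
    converter would (no quantization of the amplitude values)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


fs = 24_000   # samples per second (assumed)
f = 2_000     # Hz, so fs / f = 12 samples per cycle
x = sample_tone(f, fs, 25, amplitude=6.0)
print([round(v, 1) for v in x[:13]])
print("samples per cycle:", fs / f)
```

A real converter would also round each value to one of a finite number of levels; that quantization step is omitted here to keep the sketch focused on the timing pulse.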

Once  the  word  list  is  digitized  and  stored  as  a  time  domain 
array  within  a  digital  computer,  the  processing  scheme  may  be 
initiated  by  the  software  program.  The  data  are  then  converted 
in  small  time  increments  into  the  frequency  domain  by  what  is 
known  as  a  Fast  Fourier  Transform. 

The Fast Fourier Transform

A convenient and precise method for analyzing audio signals involves
representing them as sums of sinusoids or complex exponentials.
Commonly called Fourier representations, these provide an inherently
superior tool for signal processing for two fundamental reasons.
First, a linear system's response to a signal may be easily determined
by superposing its responses to the individual sinusoids or complex
exponentials that make up the signal. Second, the Fourier
representation often reveals properties of a signal that would
otherwise be less evident (Rabiner and Schafer, 1978).

Early models of speech production for steady-state vowels or
fricatives, for example, all involved a linear system excited by
either a periodic or a random source. Consequently, Fourier analysis
was traditionally used to evaluate such spectra. More recently,
however, speech has been viewed as a much more dynamically complicated
waveform (Ladefoged, 1962; Curtis, 1968; Minifie et al., 1973). The
combined transient, random, and periodic nature of a speech signal
induces marked changes in amplitude with time, violating the
steady-state requirements of a standard Fourier representation.
Instead, a short-time analysis principle applied to the Fourier method
has been found to be a valid approach to speech processing (Rabiner
and Schafer, 1978). These authors found that a steady-state assumption
for the spectral properties of speech is valid for time intervals on
the order of 10-30 msec. The review of this time-varying technique that
follows shows how its principles are applied in fast computation
algorithms for discrete Fourier analysis (FFT algorithms).
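The short-time principle can be sketched with NumPy: the signal is cut into overlapping frames whose 20 ms length falls inside the 10-30 msec range cited above, each frame is tapered with a Hamming window, and an FFT is taken of each. The hop size, window choice, sampling rate, and two-tone test signal are assumptions made for illustration and are not the processing parameters of this study.

```python
import numpy as np


def short_time_spectra(signal, sample_rate_hz, frame_ms=20.0, hop_ms=10.0):
    """Magnitude spectra of successive Hamming-windowed frames.

    Returns (frequencies_hz, spectra) with one spectrum per row. The
    frame length, hop, and window are illustrative choices.
    """
    frame_len = int(round(frame_ms * 1e-3 * sample_rate_hz))
    hop = int(round(hop_ms * 1e-3 * sample_rate_hz))
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate_hz)
    return freqs, spectra


fs = 20_000
t = np.arange(fs // 2) / fs  # half a second of signal
# Test signal: a tone that switches from 500 Hz to 1,500 Hz at 0.25 s.
x = np.where(t < 0.25, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))
freqs, S = short_time_spectra(x, fs)
print(freqs[np.argmax(S[0])], freqs[np.argmax(S[-1])])
```

Because each spectrum describes only its own 20 ms frame, the peak moves from 500 Hz in the first frame to 1,500 Hz in the last, a change that a single transform over the whole half second would blur together.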

Classical Fourier analysis of spectra has two basic approaches. For
purely periodic waveforms, one determines the Fourier Series [see
Equation 1]:


$$X(t) = a_0 + \sum_{m=1}^{\infty}\left[a_m\cos(m\omega_0 t) + b_m\sin(m\omega_0 t)\right] \qquad (1)$$

where: $t$ = time,

$T$ = the period of $X(t)$,

$\omega_0 = 2\pi/T$,

$$a_0 = \frac{1}{T}\int_{-T/2}^{T/2} X(t)\,dt,$$

$$a_m = \frac{2}{T}\int_{-T/2}^{T/2} X(t)\cos(m\omega_0 t)\,dt,$$

and

$$b_m = \frac{2}{T}\int_{-T/2}^{T/2} X(t)\sin(m\omega_0 t)\,dt.$$


By contrast, pulse-like waveforms are analyzed by evaluating their
Fourier Transform [see Equation 2]:


$$x(f) = \int_{-\infty}^{\infty} X(t)\,e^{-j2\pi f t}\,dt \quad\text{or}\quad X(t) = \int_{-\infty}^{\infty} x(f)\,e^{j2\pi f t}\,df \qquad (2)$$


Underlying the Fourier Series method is the following notion: the
elemental periodic waveform is a sinusoid of the form

$$x(t) = A\cos(2\pi f t - \theta).$$

Further, every periodic waveform is composed of some unique summation
of such sinusoids. Each of the summation terms exists at one of the
discrete frequencies given by $f_m = m/T$; each term is thus
harmonically related to the fundamental, $1/T$, by the index $m$. The
two parameters that define the set of terms are the amplitude spectrum
and the phase spectrum. Whenever these two parameters are evaluated,
whether electrically, mechanically, or mathematically, the process is
called spectral analysis.
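The coefficients of Equation 1, and the amplitude and phase spectra built from them, can be approximated numerically. The sketch averages over one period on a uniform grid (any full period gives the same integrals) and uses a square wave as a test signal because its coefficients are known in closed form: b_m = 4/(mπ) for odd m and zero otherwise. The function names, grid size, and 1 ms period are assumptions; the phase follows from writing a_m cos + b_m sin as A_m cos(mω0t - θ_m).

```python
import numpy as np


def fourier_series_coefficients(x_of_t, period, n_harmonics, n_points=4096):
    """Approximate a_0, a_m and b_m of Equation 1 over one period.

    The integrals are replaced by averages over n_points evenly spaced
    samples (a rectangle rule); the jumps in a square wave leave a small
    residual error of order 1/n_points.
    """
    t = np.arange(n_points) * period / n_points
    x = x_of_t(t)
    w0 = 2.0 * np.pi / period
    m = np.arange(1, n_harmonics + 1)[:, None]   # harmonic index as a column
    a0 = x.mean()                                # (1/T) * integral of X(t)
    a = 2.0 * np.mean(x * np.cos(m * w0 * t), axis=1)
    b = 2.0 * np.mean(x * np.sin(m * w0 * t), axis=1)
    return a0, a, b


PERIOD = 1e-3  # 1 ms period, i.e. a 1,000 Hz fundamental (illustrative)


def square_wave(t):
    """+1 for the first half of each period, -1 for the second half."""
    return np.where((t % PERIOD) < PERIOD / 2, 1.0, -1.0)


a0, a, b = fourier_series_coefficients(square_wave, PERIOD, n_harmonics=5)
amplitude = np.hypot(a, b)      # amplitude spectrum A_m
phase = np.arctan2(b, a)        # phase spectrum theta_m
for m in range(1, 6):
    print(f"{m / PERIOD:6.0f} Hz  A = {amplitude[m - 1]:.3f}  theta = {phase[m - 1]:.2f} rad")
```

The odd harmonics carry amplitudes close to 4/(mπ) and the even ones nearly vanish; for those near-zero terms the printed phase is not meaningful.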

The  Fourier  Transform,  on  the  other  hand,  defines  x(f)  as  a 
continuous  function  of  frequency.  There  is  no  index,  m,  as  in  the 
Fourier  Series,  which  would  have  indicated  a  dependence  upon  discrete 
frequencies. Aperiodic, or pulse-like, waveforms must have their
complete time history integrated to determine the corresponding
frequency composition.

When analyzing speech, however, one finds a mixture of both periodic
and aperiodic waveforms (Minifie et al., 1973); neither method alone is
complete. Rather, a Discrete Fourier Transform (DFT) capitalizes on the
discrete, sampled nature of the digitized waveform to provide spectral
information regardless of periodicity [see Equation 3]:

$$X(k\Delta f) = \frac{1}{M\Delta t}\sum_{m=0}^{M-1} x(m\Delta t)\,e^{-j2\pi km/M}, \tag{3}$$

where: t = time,

T = time interval of sampling (the duration of the sampled record),

Δt = time sampling spacing,

Δf = frequency sampling interval = 1/T = 1/(MΔt),

M = number of samples in T,

m = time index (0, 1, 2, ..., M−1),

and

k = frequency index (0, 1, 2, ..., M−1).

If the source function x(mΔt) repeats itself with time, the evaluation proceeds as it would in a Fourier Series computation. In the case of a transient function, the array of distinct amplitude values permits a direct summation in place of the integral of the Fourier Transform. It should be noted that computational algorithms involving a series summation are much more efficiently realized by a computer than are formal evaluations of integrals. Thus the DFT, which serves as the basic algorithm of the Fast Fourier Transform (FFT), efficiently performs spectral analysis of speech signals, given adherence to certain necessary criteria described below.
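The following sketch evaluates Equation 3 by direct summation and checks it against NumPy's FFT. The 1/(MΔt) scale factor is taken from Equation 3 as written; `numpy.fft.fft` applies no scale factor, so it is divided out explicitly for the comparison. The random 64-point segment is only a stand-in for digitized speech, and `dft_eq3` is a hypothetical helper name.

```python
import numpy as np

def dft_eq3(x, dt):
    """Direct O(M^2) evaluation of Equation 3, including its 1/(M*dt) scale factor."""
    x = np.asarray(x, dtype=complex)
    M = len(x)
    m = np.arange(M)                      # time index
    k = m.reshape(-1, 1)                  # frequency index (column vector)
    kernel = np.exp(-2j * np.pi * k * m / M)
    return kernel @ x / (M * dt)

fs = 20_000.0                             # 20,000 samples/second, as in this study
dt = 1.0 / fs
rng = np.random.default_rng(0)
segment = rng.standard_normal(64)         # stand-in for a digitized speech segment

direct = dft_eq3(segment, dt)
fast = np.fft.fft(segment) / (len(segment) * dt)   # numpy's FFT omits the scale factor
print(np.allclose(direct, fast))          # True
```

Both routes compute the same sum; the FFT simply organizes the arithmetic more efficiently, which matters once segments contain hundreds of samples.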

The main requirement for using the DFT is that the digitized speech waveform must satisfy the Nyquist sampling criterion: the sampling rate must be greater than twice the highest frequency in the waveform sampled (Rabiner and Schafer, 1978). Sampling at more than twice the highest frequency contained in the input provides more than two samples per cycle of every component; this ensures that each component's frequency is coded correctly. Violation of this rule leads to the phenomenon of aliasing, in which high-frequency components are mistaken for low-frequency information (see Figure 4). Foldover is a term which describes the magnitude of the frequency displacement error induced by aliasing. That is, half the sampling frequency serves as a pivot frequency for aliasing: the low-frequency alias appears as far below the pivot frequency as the high-frequency component is above it. For example, if a sampling rate of 20 kHz is used for speech (yielding a pivot frequency of 10 kHz), any component at 15 kHz is "folded down" to become a 5000 Hz alias, yielding an inaccurate spectrum (see Figure 5).
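The foldover rule can be written as a short function and then checked numerically by sampling a 15 kHz cosine at 20 kHz and locating the spectral peak. This is a minimal sketch; the function name and the 200-sample record length are illustrative choices.

```python
import numpy as np

def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a component at f Hz after sampling at fs Hz."""
    f = f % fs                          # sampled spectra repeat every fs
    return fs - f if f > fs / 2 else f  # fold about the pivot, fs/2

fs = 20_000
print(alias_frequency(15_000, fs))      # 5000

# Numerical check: sample a 15 kHz cosine at 20 kHz and find the spectral peak.
n = np.arange(200)
samples = np.cos(2 * np.pi * 15_000 * n / fs)
freqs = np.fft.rfftfreq(len(n), d=1.0 / fs)
peak_hz = float(freqs[np.argmax(np.abs(np.fft.rfft(samples)))])
print(round(peak_hz))                   # 5000
```

Components below the 10 kHz pivot are returned unchanged, which is why speech is low-pass filtered before A/D conversion rather than corrected afterward.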

Another important consideration in computing the DFT within a software program is the time necessary to complete it. Cooley and Tukey (1965) found a significant reduction in the number of complex additions and multiplications needed for this transform whenever the number of samples chosen for each computation equaled a power of 2 (i.e., when M = 2^n): the operation count drops from the order of M² to the order of M log₂ M. In addition, when M = 2^n each of the two indices (m and k) can be expanded into binary digits that take only the values 0 or 1, a feature exploited by certain FFT subroutines to gain additional time advantage (Singleton, 1969).
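A compact way to see where the savings come from is a recursive radix-2 (decimation-in-time) Cooley-Tukey FFT. The sketch below is a textbook formulation written for clarity, not the subroutine used in this study (production FFTs such as Singleton's are iterative and more elaborate). It splits each M-point DFT into two M/2-point DFTs of the even- and odd-indexed samples.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (Cooley-Tukey).

    The length must be a power of 2, matching the M = 2^n condition above.
    Each level halves the problem size, so the total work grows as
    M*log2(M) rather than M^2.
    """
    x = np.asarray(x, dtype=complex)
    M = len(x)
    if M == 0 or M & (M - 1):
        raise ValueError("length must be a power of 2")
    if M == 1:
        return x
    even = fft_radix2(x[0::2])            # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])             # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(M // 2) / M) * odd
    return np.concatenate([even + twiddle, even - twiddle])

x = np.random.default_rng(1).standard_normal(256)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))   # True
```

For M = 256 the direct sum needs 65,536 complex multiplications, whereas the radix-2 scheme needs about (M/2)·log₂M = 1,024 twiddle multiplications.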

An elementary property of the Discrete Fourier Transform is that it is a linear operation (Cooley et al., 1969). Stated mathematically, X(kΔf) ↔ x(mΔt) form a transform pair, which establishes the validity of performing an inverse DFT. Since this research manipulates spectra while they are in the frequency domain, a reliable means of moving into and back out of that domain is a necessary and sufficient requirement. Thus, the linearity of the DFT provides the symmetrical tool upon which such processing as digital filtering depends.
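Both properties relied on here, linearity and an exact inverse, can be checked numerically with NumPy's FFT routines. In this sketch the random test signals and the weights a and b are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(256)
x2 = rng.standard_normal(256)
a, b = 2.0, -0.5

# Linearity: DFT(a*x1 + b*x2) == a*DFT(x1) + b*DFT(x2)
linear = np.allclose(np.fft.fft(a * x1 + b * x2),
                     a * np.fft.fft(x1) + b * np.fft.fft(x2))

# Invertibility: the inverse DFT recovers the original time samples
recovered = np.fft.ifft(np.fft.fft(x1))
round_trip = np.allclose(recovered.real, x1) and np.allclose(recovered.imag, 0.0)

print(linear, round_trip)   # True True
```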

The FFT is the functional software realization of the Discrete Fourier Transform. Used as a subroutine, it returns (in a forward transform) arrays corresponding to the real and imaginary components of a spectrum's amplitudes. Conversion of these values to polar form yields one magnitude (and one phase) for each frequency array element. There is a fixed frequency interval between successive array elements; for example, array element number one might correspond to the amplitude of the signal segment at 70 Hz, while array element number two might correspond to its amplitude at 140 Hz, and so on. This interval depends on the sampling rate used to initially digitize the waveform and on the number of samples (time segment size) used in the FFT process; from Equation 3, Δf = 1/(MΔt). Sampling rates are generally in excess of 20,000 samples/second in order to satisfy the Nyquist criterion for the primary audio range (i.e., below 10,000 Hz), while time segments on the order of 10-30 msec are taken sequentially to approximate the steady state condition described earlier (cf. Rabiner and Schafer, 1978).
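To make the element spacing concrete, this sketch converts one segment's FFT output to polar form and reports the frequency of each element. The 20,000 samples/second rate is taken from this study. The 256-sample segment length is an assumption, giving 12.8 msec segments and a spacing of 78.125 Hz, rather than the illustrative 70 Hz in the text. The test tone is placed exactly on element 10.

```python
import numpy as np

FS = 20_000   # samples/second, the rate used in this study
SEG = 256     # samples per FFT segment (assumed; 12.8 ms at 20 kHz)

def polar_spectrum(segment, fs):
    """Forward FFT of one segment, converted from rectangular to polar form."""
    spec = np.fft.rfft(segment)                        # real + imaginary parts
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)  # frequency of each element
    return freqs, np.abs(spec), np.angle(spec)         # magnitude and phase

delta_f = FS / SEG                      # = 1/(M*dt)
print(delta_f, 1000 * SEG / FS)         # 78.125 12.8

t = np.arange(SEG) / FS
tone = np.cos(2 * np.pi * 10 * delta_f * t)   # a tone lying exactly on element 10
freqs, mag, phase = polar_spectrum(tone, FS)
print(np.argmax(mag), freqs[np.argmax(mag)])  # 10 and about 781.25
```

Lengthening the segment narrows the spacing but averages over a longer stretch of speech, which is the trade-off behind the 10-30 msec guideline.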

Note that this information is stored independently of specific frequency data: the association of a voltage value with a particular frequency is an arbitrary component of the output process, and the stored array of voltages is independent of frequency information prior to output processing. Thus, the term "filtering" takes on a new meaning in the digital mode.

Instead  of  running  the  signal  through  a  relatively  coarse  analog 
filter,  each  instantaneous  spectral  array  may  be  modified  by  simply 
specifying  the  energy  content  between  predetermined  frequency 
limits  (see  Figure  6).  The  slopes  on  digital  "filters"  are  nearly 
infinite  and  permit  the  generation  of  tightly  tuned,  nonoverlapping 
band-pass  filtering  assignments  like  those  found  in  the  normal 
auditory  periphery  (cf.  Scharf,  1970). 

The processed spectra were made using the frequency limits recommended by Scharf (1970) and reported above in Table 1. The discrete frequency amplitudes that fall within each bandwidth are averaged as an r.m.s. value; each discrete amplitude in that band is then set equal to this r.m.s. value, limiting the resolution to the preselected bandwidth for that time segment (see Figure 6). The bandwidths shown in Table 1 give the limits for a normal critical bandwidth (i.e., CB = 1X). Coarser filtering schemes are realized by multiplying the bandwidths by a chosen integer value (retaining the original center frequency) and averaging the amplitudes contained within these widened limits.
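A minimal sketch of the band-averaging step for one time segment follows. Several details here are assumptions rather than facts from the text. The band edges are hypothetical (Table 1 is not reproduced), and the "center frequency" is taken as the arithmetic midpoint of each band. Where widened bands overlap, each band's r.m.s. value is added to every element it covers. That last choice is one reading of the later observation that energy in overlapping regions is "integrated more than once."

```python
import numpy as np

def band_limit(mag, freqs, edges, factor=1):
    """Limit spectral resolution to (possibly widened) critical bands.

    mag    : magnitudes of one time segment's spectrum
    freqs  : frequency (Hz) of each element of mag
    edges  : edges (Hz) of the normal, edge-to-edge 1X bands (hypothetical here)
    factor : integer widening factor, e.g. 1, 3 or 7
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    out = np.zeros_like(mag)
    for lo, hi in zip(edges[:-1], edges[1:]):
        centre = 0.5 * (lo + hi)              # assumed: arithmetic centre kept fixed
        half = 0.5 * factor * (hi - lo)       # widened half-bandwidth
        in_band = (freqs >= centre - half) & (freqs < centre + half)
        if in_band.any():
            rms = np.sqrt(np.mean(mag[in_band] ** 2))
            out[in_band] += rms               # assumed: overlapping bands accumulate
    return out

# Hypothetical 200 Hz-wide bands and a flat spectrum sampled every 50 Hz.
edges = [0, 200, 400, 600, 800, 1000]
freqs = np.arange(0, 1000, 50.0)
flat = np.ones_like(freqs)
print(band_limit(flat, freqs, edges, 1).sum())   # 20.0: edge-to-edge bands
print(band_limit(flat, freqs, edges, 3).sum())   # 52.0: overlapping bands add area
```

With edge-to-edge 1X bands every element belongs to exactly one band, so a flat spectrum passes through unchanged. At 3X the overlapping bands increase the total area, which parallels the effect described in Chapter V.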

Once the frequency assignments have been made, the new array of spectral magnitudes is converted to rectangular form and passed to an inverse FFT. The processed speech segment, now back in the time domain, is stored in a new array to await output. The loop (conversion to the frequency domain, implementation of the frequency assignments, and the subsequent call to the inverse FFT) continues until all time segments have been processed.
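The per-segment loop just described can be sketched as follows. The text does not state how phase is handled when the new magnitudes are converted back to rectangular form, so this sketch assumes the original phase of each element is kept. `modify_magnitude` is a hypothetical placeholder for the frequency assignments. Segments here are consecutive and non-overlapping, i.e., without the smoothing described next.

```python
import numpy as np

def process_segments(x, seg_len, modify_magnitude):
    """Forward FFT -> new magnitudes -> rectangular form -> inverse FFT, per segment.

    Segments are consecutive and non-overlapping (no smoothing). Any final
    partial segment shorter than seg_len is left as zeros in this sketch.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for start in range(0, len(x) - seg_len + 1, seg_len):
        spec = np.fft.rfft(x[start:start + seg_len])
        mag, phase = np.abs(spec), np.angle(spec)          # polar form
        new_mag = modify_magnitude(mag)                    # frequency assignments
        new_spec = new_mag * np.exp(1j * phase)            # assumed: original phase kept
        out[start:start + seg_len] = np.fft.irfft(new_spec, n=seg_len)
    return out

x = np.random.default_rng(3).standard_normal(1024)
unchanged = process_segments(x, 256, lambda mag: mag)   # identity assignment
print(np.allclose(unchanged, x))                         # True
```

With an identity assignment the output reproduces the input. Any real modification changes each segment independently, which is the source of the boundary discontinuities discussed next.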

The  Smoothing  Process 

This  procedure  of  taking  the  speech  signal  "a  slice  at  a 
time"  for  processing  takes  advantage  of  the  discrete  nature  of  the 
stored  signal.  Simply  recompiling  the  string  of  processed  segments 
assumes  that  the  envelope  of  each  time  slice  does  not  differ 
drastically  from  what  it  was  before  processing.  This  has  not  been 
found  to  be  true  in  practice,  however.  In  fact,  substantial  noise 
appearing  in  the  output  of  such  processing  may  result  directly 
from  this  practice.  Consider,  for  example,  the  demarcation  point 
"A"  in  Figure  7a,  indicating  where  one  time  segment  ends  (array 
point  19)  and  another  begins  (array  point  20).  After  processing 
(see  Figure  7b) ,  the  relative  amplitudes  across  that  juncture  are 
significantly  disparate  from  one  another  due  to  the  composite  spectral 
changes  made  within  each  segment.  Analog  reproduction  devices 
(especially  earphones  and  loudspeakers)  are  unable  to  accurately 
transduce  such  a  jump  in  amplitude.  The  resultant  acoustic  output 
at  such  a  point  is  a  transient  "pop."  If,  for  example,  the  time 
segments  are  each  15  msec  long,  then  one  transient  would  occur  every 
15  msec.  This  translates  into  a  67  Hz  "buzz"  signal  which  modulates 
the  entire  acoustic  output,  distorting  its  spectral  content. 
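The buzz frequency is simply the repetition rate of the segment boundaries, i.e., the reciprocal of the segment duration. At the 20,000 samples/second rate used in this study, a 15 msec segment spans 300 samples:

$$f_{\text{buzz}} = \frac{1}{T_{\text{seg}}} = \frac{1}{15\ \text{msec}} \approx 66.7\ \text{Hz} \approx 67\ \text{Hz}.$$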


To correct this problem, a software procedure was written which will henceforth be referred to as "smoothing." The technique basically involves an isolation procedure that prevents significant envelope changes from occurring across each processed time-segment boundary. The first call to the FFT (henceforth known as a "pass") sends a specific number of time-domain amplitude values to be converted into the frequency domain. Upon assignment of the specified spectral shape, the data are returned to the time domain via an inverse FFT call; this pass is identical to the general procedure described earlier. The second pass sends the same number of array points to the FFT as the first pass; now, however, the first 10 percent of the points are the same array points as the last 10 percent of the first pass' call to the FFT. For example, if the first pass sends array points 1-256 to the FFT, pass number 2 would send array points 231-486 to the FFT, pass number 3 would send array points 461-716, and so forth. This repetition between the values at the start and finish of each array isolates the boundary of each segment from large envelope discontinuities. After the last pass has been completed, the entire string of processed segments is rewritten to an output array by taking only the first 90 percent of each segment (except for the final pass, which is taken in its entirety). This accomplishes two goals: the repeated small time sectors are edited out, while a more nearly continuous envelope change across the boundary is approximated. The signal's amplitude vs time history is still discrete; however, these amplitudes now vary smoothly across the segment boundaries. Hence, the smoothing procedure adjusts the precise features of processing-induced transients in the signal's envelope at the software level, such that perceptible "pops" are eliminated from the output while the data are retained in digital form.
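The overlapped passes can be sketched as below, assuming 256-point segments and a 10 percent (26-point) overlap, so that each pass starts 230 points after the previous one and roughly the first 90 percent (230 points) of each processed segment is kept. The text does not say how a final partial segment is handled, so this sketch zero-pads the tail. `process` stands for any per-segment FFT/inverse-FFT operation, and both function names are hypothetical.

```python
import numpy as np

def pass_ranges(n_passes, seg_len=256, overlap_frac=0.10):
    """1-based (first, last) array points sent to the FFT on each pass."""
    hop = seg_len - int(round(overlap_frac * seg_len))   # 230 for 256 points
    return [(p * hop + 1, p * hop + seg_len) for p in range(n_passes)]

def smooth_process(x, process, seg_len=256, overlap_frac=0.10):
    """Overlapped passes; keep the first ~90% of each processed segment,
    and the final segment in its entirety."""
    x = np.asarray(x, dtype=float)
    hop = seg_len - int(round(overlap_frac * seg_len))
    n = len(x)
    n_passes = 1 if n <= seg_len else 1 + int(np.ceil((n - seg_len) / hop))
    padded = np.zeros((n_passes - 1) * hop + seg_len)    # assumed: zero-pad the tail
    padded[:n] = x
    pieces = []
    for p in range(n_passes):
        seg = process(padded[p * hop:p * hop + seg_len])
        pieces.append(seg if p == n_passes - 1 else seg[:hop])
    return np.concatenate(pieces)[:n]

print(pass_ranges(3))   # [(1, 256), (231, 486), (461, 716)]
x = np.random.default_rng(4).standard_normal(1000)
print(np.allclose(smooth_process(x, lambda s: s), x))   # True
```

With an identity `process` the stitched output equals the input, confirming that discarding the repeated points neither loses nor duplicates samples.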

Note that these manipulations (filtering assignments and smoothing procedures) all occur outside of the signal's real time. This characteristic offers several unique advantages. First, a high degree of precision is achieved during processing of the signal; digital editing and precise spectral shaping are examples where this feature excels. Second, the number of possible modifications greatly increases when the signal is available as discrete quanta outside of real time. Since the entire duration of the signal is accessible as a quantified whole, all dimensions of the input may be manipulated simultaneously. Finally, iterative schemes may be conducted, using the speed of the computer's hardware to analyze different combinations of precise modifications. In many cases the experimenter does not know in advance the exact combination needed to attain a specific output. A trial-and-error procedure in real time is inherently limited by the need to completely process a signal each time a solution is tried. In the digital mode, however, the desired output is returned in one step, since the iterations occur within the execution of the software program. Thus, the non-real-time nature of digital signal processing offers greater opportunity for precise signal modification than conventional analog filters and modulators.


Chapter  IV 


PROCEDURES 


Subjects 

Twenty subjects, ten female and ten male, participated in the speech discrimination test. The subjects, aged nineteen to twenty-five years, were selected from the student population at the University. Subjects were screened for normal hearing; those with thresholds greater than 20 dB at any octave frequency from 250 to 8000 Hz in both ears were eliminated. The right ear was the test ear for all subjects unless only the left ear showed normal sensitivity. All subjects were paid an hourly wage at the current rate for experimental subjects at the University.

Equipment 

The processed signals described in the previous chapter were generated using a hybrid computer at The Pennsylvania State University, University Park, Pennsylvania. The system comprises an EAI (Electronic Associates Incorporated) Model 680 analog computer interfaced with a DEC (Digital Equipment Corporation) digital computer, Model PDP-10. The A/D and D/A conversions were both performed at a rate of 20,000 samples/second. The audio output was recorded onto magnetic tape via a Crown Model BP824 one-quarter inch tape recorder at seven-and-one-half inches per second (i.p.s.).

The  discrimination  tasks  were  performed  using  the  apparatus 
diagrammed  in  Figure  8,  including  an  Ampex  Model  AG-440B  one-quarter 
inch  tape  recorder,  a  Maico  Model  MA-18  audiometer  calibrated  to 
ANSI  1969  standards,  and  a  TDH-39  earphone  fitted  with  an  MX-41/AR 
cushion.  The  tests  were  performed  in  a  Suttle  Corporation  Model 
B1  acoustically  isolated  quiet  room. 

Taped  Stimulus  Materials 

A clinical audiometric word list was required as the input audio material to be processed. Northwestern University's NU-6 word list was found to be most desirable, since it includes CNC (consonant-nucleus-consonant) words as opposed to only CVC (consonant-vowel-consonant) words (cf. Tillman and Carhart, 1963). In other words, it contains vowel sound combinations, i.e., nuclei such as the /ɔ/ with /ɪ/ in the word "boil," and the /a/ with /ɪ/ in the word "bite." These nuclei occur frequently in spoken English, and a word list which includes them is especially representative of the variety of sounds naturally occurring in the language.

The tapes generated have three frequency resolutions above 50L: a bandwidth equal to the resolution of the normal critical bandwidth (1X); three times the normal critical bandwidth (3X); and seven times normal (7X). The effect of the processing may be seen visually in Figures 9, 10, 11, and 12, including a comparison plot of the spectrum vs time for an unprocessed item.


Figure  8.  Experimental  apparatus. 
Method

All  subjects  read  and  signed  an  informed  consent  document, 
which  contained  an  explanation  of  the  purpose  and  procedure  of  the 
study  as  well  as  an  assurance  of  confidentiality  of  the  data  with 
regard  to  their  identity.  It  was  explained  that  the  test  basically 
involved  listening  to  a  standard  clinical  speech  discrimination 
word  list,  which  had  been  modified  by  a  novel  computer  manipulation 
technique. 

After the screening procedure (performed in the quiet room), the subjects were presented with the three fifty-word lists in the following resolution order: 7X, 3X, 1X. Since the same word list was used for each resolution bandwidth, this sequence (coarsest resolution first) was chosen in order to minimize learning effects. The signals reached the earphone at a level of 50 dB HL with a signal-to-noise ratio of +30 dB; white noise was used as the masking source. Subjects were given three separate answer sheets and asked to write down the word they thought was said, guessing when necessary. Two points were scored for each correct identification and zero for an incorrect or blank answer. Each subject thus had three percentage scores, one for each list.
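Because each list has fifty words and each correct word earns two points, the point total is already a percentage. The sketch below assumes that a response counts as correct when it matches the list word exactly, ignoring case and surrounding spaces. In practice the examiner judged each answer, and the helper name is hypothetical.

```python
def percent_score(responses, key):
    """Two points per correct word; for a 50-word list the total is a percentage.

    Assumed rule: exact match ignoring case and surrounding spaces;
    blank (empty or None) responses score zero.
    """
    if len(responses) != len(key):
        raise ValueError("one response per list item is required")
    correct = sum(
        1 for said, target in zip(responses, key)
        if said and said.strip().lower() == target.strip().lower()
    )
    return 2 * correct

key = [f"word{i}" for i in range(50)]         # placeholder list items
responses = key[:45] + [""] * 5               # 45 correct, 5 blank
print(percent_score(responses, key))          # 90
```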




Chapter  V 


RESULTS 

By comparing the plots of various processed phrases (see Figures 9, 10, 11, and 12), the effect of the processing may be seen visually. The reader should keep in mind that in the two pathologic conditions (3X and 7X) the bands overlap each other, whereas those in the 1X case are edge to edge. This overlap of bands adds area to each spectral time segment (compare Figure 9 with Figure 12), since energy in the overlapping regions is integrated more than once. Whereas the computer interprets this effect as adding more amplitude area to a three-dimensional array, the auditory system perceives it as a loudness increase. In addition, the peaks of the phrase "Say the word, 'date'" are noticeably more rounded in the 3X and 7X filtering conditions (Figures 11 and 12) than in the unprocessed case (Figure 9), indicating the reduced resolution for the widened critical band conditions.

The group means for the discrimination tests are given in Table 2. The cell means represent the average discrimination score attained during each bandwidth-controlled presentation. An analysis of variance was performed on the discrimination data from this simple randomized design (cf. Lindquist, 1953). The results are summarized in Table 3. The analysis of variance indicated significant differences among the bandwidth conditions. The Tukey Multiple T-Test [a wholly significant difference test (cf. Glass and Stanley, 1970)] demonstrated a significant, monotonic, decreasing trend as the bandwidth conditions were increased from 1X through 3X to 7X. That is:

a. the 3X condition scores were significantly lower
than the 1X condition scores, and

b. the 7X condition scores were significantly lower
than the 3X condition scores (and the 1X condition
scores).

Table 2

Mean Discrimination Scores for Several
Bandwidth Resolutions

Bandwidth Resolution              1X      3X      7X
Mean Discrimination Score (%)    90.5    85.1    56.8
Table 3

Analysis of Variance Summary Table

Source           df        SS       MS
Treatments        2    17,876    8938*
Within Groups    57       513        9
Total            59    18,389

*significant using F-test (cf. Lindquist, Chapter 3, 1953)
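Table 3 does not list the F ratio itself, but it follows directly from the mean squares in the table:

$$F = \frac{MS_{\text{treatments}}}{MS_{\text{within}}} = \frac{17{,}876/2}{513/57} = \frac{8938}{9} \approx 993, \qquad df = (2,\,57).$$

This value lies far above the critical F for (2, 57) degrees of freedom (roughly 5.0 at the .01 level), consistent with the significance marked in the table.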
Chapter VI


DISCUSSION  AND  CONCLUSIONS 

The peripheral auditory system performs a preliminary place-specific frequency analysis of incoming acoustic signals. An efferent feedback loop pathway carries out a selective inhibition of frequency-specific afferent fibers. The limit to which frequency information may be gated is called the critical band and has been observed in such psychoacoustic contexts as masking, loudness, and musical consonance. More importantly, the critical band mechanism performs noise band limiting and harmonic discrimination, both of which are crucial for the correct perception of such complex acoustic stimuli as speech. Cochlear pathologies can affect the integrity of the critical bandwidth mechanism, which in turn can produce deficits in these functions through widening and overlapping of the critical bands. The effect on the listener is one of resolution loss: the listener's frequency-resolving power becomes insufficient to separate speech from the rest of the acoustic stimulus. To determine the unknown degree to which such a pathology has manifested itself, this preliminary research took a deductive approach, whereby the bandwidth resolution of a presented speech stimulus was controlled while the perceptual response was monitored.


To generate such bandwidth-controlled stimuli, a digital signal processing scheme was composed, taking advantage of the precise and diverse signal modification capabilities of a discrete system. Such processing algorithms are not without usage requirements and inherent limitations, which are reflected in the Nyquist criterion, the time-segment rules, and the need for the smoothing process. Even with these requisites, the digital approach offers greater signal processing opportunities than conventional analog filters and modulators.

The use of resolution-limited tapes on subjects with normal hearing yields preliminary data by which such a processing scheme may be evaluated. In an attempt to avoid ceiling effects, signals were presented at a signal-to-noise ratio of +30 dB. The subject performances for the 1X condition were not significantly different from 100 percent. This result indicates that the signal-to-noise condition used in this study was not rigorous enough to eliminate ceiling effects under the narrowest resolution condition.

The performance of the listeners for the 3X and 7X conditions was significantly poorer than their discrimination performance at the 1X condition. Preliminary item comparison reveals that the errors of the normal listeners at the 7X condition are similar to those of sensorineural listeners with finer resolution conditions (Bienvenue et al., 1980). This trend conforms to the expected performance on these tests; however, it must be viewed as a preliminary conclusion, since the trend is based on the slopes given by only two pathologic resolution discrimination scores. Further support for this conclusion stems from observation of a phenomenon occurring in the test signals as a result of processing.

The reader will recall that the amplitude area of the array under pathologic conditions (i.e., the 3X and 7X cases) is increased compared to that for the normal condition (i.e., the 1X case). This phenomenon was shown to result from the integration of energy more than once when that energy was located at a frequency common to more than one band in the widened critical band case. Whereas the computer interprets this effect as adding more amplitude area to a three-dimensional array, the auditory system perceives it as a loudness increase. This abnormal perception of a loudness increase for a given input amplitude is akin to the observed phenomenon of recruitment. Thus, there are two inherent features arising from this processing scheme that systematically model phenomena known to occur in the sensorineural hearing-impaired population; the processing scheme thereby warrants further study as a model of auditory processing for speech.

Further  research  in  this  area  is  certainly  warranted  and 
should  include: 

1. More rigorous signal-to-noise ratios when testing
normal subjects to minimize ceiling effects;

2. The use of sensorineural hearing-impaired subjects
whose bandwidth distortion is determined beforehand
by other psychophysical techniques and compared to
the processing condition at which they first exhibit
a detriment in performance;

3.  Inclusion  of  more  bandwidth  conditions  (2X  and  5X, 
for  example)  to  yield  a  more  accurate  determination 
of  performance  trends. 
Summary


This research has been a truly interdisciplinary endeavor, drawing from such schools of thought as engineering, physiology, physics, mathematics, and psychology. In conclusion, this thesis has demonstrated:

1. the feasibility of generating speech signals of
varying bandwidth resolutions equal to and greater
than the normal critical bandwidth using a
novel digital signal processing algorithm,
specifically,

a. bandwidth resolution limited signals
such as those described in Chapter III
have been produced;

b. the processing scheme allows for variation
of bandwidths from the normal critical
band to integer multiples of the critical
band;

c. the stored array of processed speech signals
demonstrates an abnormal growth of amplitude
in the pathological cases (3X and 7X), a
phenomenon akin to the recruitment seen
in sensorineural hearing loss;

2. that normal listeners hearing the 3X and 7X lists
demonstrate decremental performance in speech
intelligibility, specifically,

a. speech intelligibility scores for a pathologic
processing scheme equivalent to a pathologic
listening condition (3X or 7X) are comparable
to scores obtained by a sensorineural hearing
impaired listener presented with unprocessed,
equivalent word lists;

b. as the bandwidth resolution allowed to the
normal listeners was systematically widened,
the intelligibility scores showed a monotonic
decreasing trend;

3. that the proposed processing scheme is a reasonable
approximation to the modeling of hearing for the
purposes of speech intelligibility in both the normal
and pathologic cases, and that this approach warrants
further study and development.